Task placement of parallel multi-dimensional FFTs on a mesh communication network
Abstract
For many scientific applications, the Fast Fourier Transform (FFT) of multi-dimensional data is the kernel that limits scalability to large numbers of processors. This paper investigates an extension of a traditional parallel three-dimensional FFT (3D-FFT) implementation. The extension consists of customized MPI task mappings between the virtual processor grid of the algorithm and the physical hardware of a system with a mesh interconnect. Consequently, we derived a simple model, based on bandwidth considerations, for the range of performance achievable by a large class of mappings. This model enables us to identify scaling bottlenecks and hotspots of parallel, communication-intensive 3D-FFT applications when MPI tasks are mapped onto the network in the default way. The predictions of the model are tested on an IBM eServer Blue Gene/L system. The results demonstrate that a mapping pattern chosen carefully with regard to the network characteristics yields a significant improvement.
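The abstract does not spell out the bandwidth model, so the following is only a minimal sketch under assumptions chosen purely for illustration: one MPI task per node, a 2D virtual processor grid whose row and column communicators each perform an all-to-all during a transpose phase, and dimension-ordered (x, then y, then z) routing on a mesh without wraparound links. As a crude proxy for bandwidth pressure, it counts how many messages share the busiest directed link, comparing a default-style rank order (x varying fastest) with a hypothetical alternative (y varying fastest). The helper names `route`, `max_link_load`, `xyz_order`, `yxz_order` and `phase_loads`, and the tiny 4 × 2 × 1 mesh, are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import product

# Assumptions (illustrative only): one MPI task per node, a 3D mesh with no
# wraparound links, and dimension-ordered routing (x first, then y, then z).


def route(src, dst):
    """Directed links traversed from src to dst under x-then-y-then-z routing."""
    links, cur = [], list(src)
    for d in range(3):
        step = 1 if dst[d] > cur[d] else -1
        while cur[d] != dst[d]:
            nxt = cur.copy()
            nxt[d] += step
            links.append((tuple(cur), tuple(nxt)))
            cur = nxt
    return links


def max_link_load(comms, place):
    """Messages on the busiest directed link when every communicator
    runs an all-to-all at the same time (a crude bandwidth proxy)."""
    load = Counter()
    for comm in comms:
        for a, b in product(comm, repeat=2):
            if a != b:
                load.update(route(place(a), place(b)))
    return max(load.values(), default=0)


def xyz_order(mesh):
    """Default-style placement: rank r fills x fastest, then y, then z."""
    X, Y, _ = mesh
    return lambda r: (r % X, (r // X) % Y, r // (X * Y))


def yxz_order(mesh):
    """Hypothetical alternative: rank r fills y fastest, then x, then z."""
    X, Y, _ = mesh
    return lambda r: ((r // Y) % X, r % Y, r // (X * Y))


def phase_loads(mesh, rows, cols, mapping):
    """Busiest-link load for the row-transpose and column-transpose phases
    of a rows x cols virtual grid (rank = row * cols + col)."""
    X, Y, Z = mesh
    assert rows * cols == X * Y * Z, "one task per node is assumed"
    place = mapping(mesh)
    row_comms = [[r * cols + c for c in range(cols)] for r in range(rows)]
    col_comms = [[r * cols + c for r in range(rows)] for c in range(cols)]
    return max_link_load(row_comms, place), max_link_load(col_comms, place)


if __name__ == "__main__":
    mesh, rows, cols = (4, 2, 1), 2, 4
    print("x fastest:", phase_loads(mesh, rows, cols, xyz_order))  # (4, 1)
    print("y fastest:", phase_loads(mesh, rows, cols, yxz_order))  # (2, 2)
```

On this toy mesh the x-fastest order lays each row communicator out as a line of four nodes, so its middle link carries four messages, while the y-fastest order packs each row into a 2 × 2 block at the cost of slightly longer column paths; the busiest link over both phases drops from 4 to 2. A realistic study would substitute the machine's actual topology, routing and message sizes.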
Similar resources
A decomposition method with minimum communication amount for parallelization of multi-dimensional FFTs
The fast Fourier transform (FFT) is undoubtedly an essential primitive that has been applied in various fields of science and engineering. In this paper, we present a decomposition method for parallelization of multi-dimensional FFTs with smallest communication amount for all ranges of the number of processes compared to previously proposed methods. This is achieved by two distinguishing featur...
Performance of the 3D FFT on the 6D network torus QCDOC parallel supercomputer
QCDOC is a massively parallel supercomputer with tens of thousands of nodes distributed on a six-dimensional torus network. The 6D structure of the network provides the needed communication resources for many communication-intensive applications. In this paper, we present a parallel algorithm for three-dimensional Fast Fourier Transform and its implementation for a 4096-node QCDOC prototype. Tw...
Communication Performance of Parallel 3D FFTs Using Various Networks and Transposition Algorithms
This contribution deals with empirical investigations of the behavior of communication-time to computation-time ratios of three-dimensional parallel Fast Fourier Transforms (FFTs). Different problem sizes, numbers of processes, and different types of communication structures such as Ethernet, Fast Ethernet, Myrinet, and IBM SP Switch are considered. Preliminary results are given on algori...
Transposing Arrays on Multicomputers Using de Bruijn Sequences
Transposing an N × N array that is distributed row- or column-wise across P = N processors is a fundamental communication task that requires time-consuming interprocessor communication. It is the underlying communication task for the fast Fourier transform of long sequences and multi-dimensional arrays. It is also the key communication task for certain weather and climate models. A parallel trans...
A dynamic programming algorithm for simulation of a multi-dimensional torus in a crossed cube
The torus is a popular interconnection topology, and several commercial multicomputers use a torus as the basis of their communication network. Moreover, many parallel algorithms with torus-structured and mesh-structured task graphs have been developed. If a network can embed a mesh or torus, algorithms with mesh-structured or torus-structured task graphs can also be used in this net...